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Full text available: pdf(621.59 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper studies the problem of categorical data clustering, especially for transactional 
data characterized by high dimensionality and large volume. Starting from a heuristic 
method of increasing the height-to-width ratio of the cluster histogram, we develop a 
novel algorithm - CLOPE, which is very fast and scalable, while being quite effective. We 
demonstrate the performance of our algorithm on two real world datasets, and compare 
CLOPE with the state-of-art algorithms. 
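
As a rough illustration of the height-to-width heuristic mentioned in this abstract, the Python sketch below computes such a ratio for a cluster of transactions. The definitions used here are assumptions, since the abstract does not spell them out: width is taken as the number of distinct items in the cluster, height as the total number of item occurrences divided by the width, and the function name `histogram_ratio` is hypothetical. It is not the CLOPE algorithm itself, only the quantity that the heuristic tries to increase.

```python
from collections import Counter


def histogram_ratio(cluster):
    """Height-to-width ratio of a cluster's item histogram.

    Assumed definitions (not stated in the abstract):
      width  = number of distinct items in the cluster
      height = total item occurrences / width
    """
    # Each transaction is a set, so an item counts once per transaction.
    counts = Counter(item for transaction in cluster for item in transaction)
    width = len(counts)
    if width == 0:
        return 0.0
    height = sum(counts.values()) / width
    return height / width


# Transactions that share many items give a tall, narrow histogram.
overlapping = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}]
# Transactions with no items in common give a flat, wide histogram.
disjoint = [{"a", "b"}, {"c", "d"}, {"e", "f"}]

print(histogram_ratio(overlapping))
print(histogram_ratio(disjoint))
```

Under these assumed definitions, the overlapping cluster scores higher than the disjoint one, which matches the intuition that a good cluster groups transactions sharing many items.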
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Bases, volume 11 Issue 2 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(212.88 KB) Additional Information: full citation, abstract, citings, index terms

Several organizations have developed very large market basket databases for the 
maintenance of customer transactions. New applications, e.g., Web recommendation 
systems, present the requirement for processing similarity queries in market basket 
databases. In this paper, we propose a novel scheme for similarity search queries in 
basket data. We develop a new representation method, which, in contrast to existing 
approaches, is proven to provide correct results. New algorithms are proposed for the 

Keywords: Data mining, Market basket data, Nearest-neighbor, Similarity search
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Publisher: ACM Press 

Full text available: pdf(367.51 KB) Additional Information: full citation, abstract, references, index terms

It is widely recognized that developing efficient and fully automated algorithms for
clustering large transactional datasets is a challenging problem. In this paper, we propose 
a fast, memory-efficient, and scalable clustering algorithm for analyzing transactional 
data. Our approach has three unique features. First, we use the concept of Weighted 
Coverage Density as a categorical similarity measure for efficient clustering of 
transactional datasets. The concept of weighted coverage density is in ... 

Keywords: AMI, LISR, SCALE, weighted coverage density 

Probabilistic modeling of transaction data with applications to profiling, visualization,
and prediction

Igor V. Cadez, Padhraic Smyth, Heikki Mannila 

August 2001 Proceedings of the seventh ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '01 

Publisher: ACM Press 

Full text available: pdf(872.07 KB) Additional Information: full citation, abstract, references, citings, index terms

Transaction data is ubiquitous in data mining applications. Examples include market 
basket data in retail commerce, telephone call records in telecommunications, and Web 
logs of individual page-requests at Web sites. Profiling consists of using historical 
transaction data on individuals to construct a model of each individual's behavior. Simple 
profiling techniques such as histograms do not generalize well from sparse transaction 
data. In this paper we investigate the application of probabilisti ... 

Keywords: EM algorithm, mixture models, profiles, transaction data 
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Eui-Hong (Sam) Han, George Karypis

October 2005 Proceedings of the 14th ACM international conference on Information 
and knowledge management CIKM '05 

Publisher: ACM Press 

Full text available: pdf(105.58 KB) Additional Information: full citation, abstract, references, index terms

The explosive growth of the world-wide-web and the emergence of e-commerce has led to 
the development of recommender systems—a personalized information filtering 
technology used to identify a set of N items that will be of interest to a certain user. User- 
based and model-based collaborative filtering are the most successful technology for 
building recommender systems to date and is extensively used in many commercial
recommender systems. The basic assumption in these algorithms is ... 

Keywords: collaborative filtering, e-commerce, product features, recommender systems, 
web retailer 
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Mukund Deshpande, George Karypis 

January 2004 ACM Transactions on Information Systems (TOIS), Volume 22 issue 1
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Full text available: pdf(240.61 KB) Additional Information: full citation, abstract, references, citings, index terms
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The explosive growth of the world-wide-web and the emergence of e-commerce has led to 
the development of recommender systems— a personalized information filtering 
technology used to identify a set of items that will be of interest to a certain user. User- 
based collaborative filtering is the most successful technology for building recommender 
systems to date and is extensively used in many commercial recommender systems. 
Unfortunately, the computational complexity of these methods grows I ... 

Keywords: e-commerce, predicting user behavior, world wide web 



SQLEM: fast clustering in SQL using the EM algorithm 
Carlos Ordonez, Paul Cereghini 

May 2000 ACM SIGMOD Record, Proceedings of the 2000 ACM SIGMOD international
conference on Management of data SIGMOD '00, Volume 29 issue 2
Publisher: ACM Press 

Full text available: pdf(1.07 MB) Additional Information: full citation, abstract, references, citings, index terms

Clustering is one of the most important tasks performed in Data Mining applications. This
paper presents an efficient SQL implementation of the EM algorithm to perform clustering 
in very large databases. Our version can effectively handle high dimensional data, a high 
number of clusters and more importantly, a very large number of data records. We 
present three strategies to implement EM in SQL: horizontal, vertical and a hybrid one. 
We expect this work to be useful for data mining programmer ... 

Industry/government track papers: Predicting customer shopping lists from
point-of-sale purchase data

Chad Cumby, Andrew Fano, Rayid Ghani, Marko Krema 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 

Full text available: pdf(286.61 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a prototype that predicts the shopping lists for customers in a retail 
store. The shopping list prediction is one aspect of a larger system we have developed for 
retailers to provide individual and personalized interactions with customers as they 
navigate through the retail store. Instead of using traditional personalization approaches, 
such as clustering or segmentation, we learn separate classifiers for each customer from 
historical transactional data. This allows us to ma ... 

Keywords: POS data, applications, classification, machine learning 



Probabilistic query models for transaction data 
Dmitry Pavlov, Padhraic Smyth 

August 2001 Proceedings of the seventh ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '01 

Publisher: ACM Press 

Full text available: pdf(958.33 KB) Additional Information: full citation, abstract, references, citings, index terms

We investigate the application of Bayesian networks, Markov random fields, and mixture 
models to the problem of query answering for transaction data sets. We formulate two 
versions of the querying problem: the query selectivity estimation (i.e., finding exact 
counts for tuples in a data set) and the query generalization problem (i.e., computing the 
probability that a tuple will occur in new data). We show that frequent itemsets are useful 
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for reducing the original data to a compressed representa ... 

10 Evolving data mining into solutions for insights: Business applications of data mining
Chidanand Apte, Bing Liu, Edwin P. D. Pednault, Padhraic Smyth

August 2002 Communications of the ACM, volume 45 issue 8 

Publisher: ACM Press 

Full text available: pdf(105.88 KB), html(27.21 KB) Additional Information: full citation, abstract, references, citings, index terms

They help identify and predict individual, as well as aggregate, behavior, as illustrated by 
four application domains: direct mail, retail, automobile insurance, and health care.

A localized algorithm for parallel association mining
Mohammed Javeed Zaki, Srinivasan Parthasarathy, Wei Li 

June 1997 Proceedings of the ninth annual ACM symposium on Parallel algorithms 
and architectures SPAA '97 

Publisher: ACM Press 

Full text available: pdf(1.56 MB) Additional Information: full citation, references, citings, index terms




12 Cluster ensembles - a knowledge reuse framework for combining multiple partitions
Alexander Strehl, Joydeep Ghosh 

March 2003 The Journal of Machine Learning Research, volume 3 
Publisher: MIT Press 

Full text available: pdf(842.50 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper introduces the problem of combining multiple partitionings of a set of objects
into a single consolidated clustering without accessing the features or algorithms that 
determined these partitionings. We first identify several application scenarios for the 
resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster 
ensemble problem is then formalized as a combinatorial optimization problem in terms of 
shared mutual information. In addition to a direct ... 

Keywords: cluster analysis, clustering, consensus functions, ensemble, knowledge reuse, 
multi-learner systems, mutual information, partitioning, unsupervised learning 

13 Evolving data mining into solutions for insights: Data-driven evolution of data mining
algorithms 

Padhraic Smyth, Daryl Pregibon, Christos Faloutsos 
August 2002 Communications of the ACM, volume 45 issue 8 
Publisher: ACM Press 

Full text available: pdf(106.77 KB), html(27.95 KB) Additional Information: full citation, abstract, references, citings, index terms

Fundamentally, these algorithms are driven by the nature of the data being analyzed, in 
both scientific and commercial applications. 

14 Research track papers: Efficient closed pattern mining in the presence of tough block
constraints 

Krishna Gade, Jianyong Wang, George Karypis 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 
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Full text available: pdf(288.81 KB) Additional Information: full citation, abstract, references, citings, index terms

Various constrained frequent pattern mining problem formulations and associated 
algorithms have been developed that enable the user to specify various itemset-based 
constraints that better capture the underlying application requirements and 
characteristics. In this paper we introduce a new class of block constraints that determine 
the significance of an itemset pattern by considering the dense block that is formed by the 
pattern's items and its associated set of transactions. Block constr ... 

Keywords: block constraint, closed pattern, tough constraint 



Data Mining with optimized two-dimensional association rules
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama
June 2001 ACM Transactions on Database Systems (TODS), volume 26 issue 2 



Publisher: ACM Press 

Full text available: pdf(947.41 KB) Additional Information: full citation, abstract, references, citings, index terms

We discuss data mining based on association rules for two numeric attributes and one 
Boolean attribute. For example, in a database of bank customers, Age and Balance are 
two numeric attributes, and CardLoan is a Boolean attribute. Taking the pair (Age, 
Balance) as a point in two-dimensional space, we consider an association rule of the form 
((Age, Balance) ∈ P) ⇒ ...

Keywords: association rules, convex hull searching, data mining, image segmentation, 
matrix searching 



Research track: Inverted matrix: efficient discovery of frequent items in large datasets
in the context of interactive mining 
Mohammad El-Hajj, Osmar R. Zaiane 

August 2003 Proceedings of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '03 

Publisher: ACM Press 

Full text available: pdf(198.31 KB) Additional Information: full citation, abstract, references, citings, index terms

Existing association rule mining algorithms suffer from many problems when mining 
massive transactional datasets. One major problem is the high memory dependency: 
either the gigantic data structure built is assumed to fit in main memory, or the recursive
mining process is too voracious in memory resources. Another major impediment is the
repetitive and interactive nature of any knowledge discovery process. To tune 
parameters, many runs of the same algorithms are necessary leading to the building ... 

Keywords: COFI-tree, association rules, frequent patterns mining, inverted matrix



17 Poster papers: Distributed data mining in a chain store database of short transactions
Cheng-Ru Lin, Chang-Hung Lee, Ming-Syan Chen, Philip S. Yu 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '02 

Publisher: ACM Press 

Full text available: pdf(635.33 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we broaden the horizon of traditional rule mining by introducing a new 
framework of causality rule mining in a distributed chain store database. Specifically, the 
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causality rule explored in this paper consists of a sequence of triggering events and a set
of consequential events, and is designed with the capability of mining non-sequential, 
inter-transaction information. Hence, the causality rule mining provides a very general 
framework for rule derivation. Note, however, that the ... 

Contributed articles on online, interactive, and anytime data mining: Mining data 
streams under block evolution 



Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan

January 2002 ACM SIGKDD Explorations Newsletter, volume 3 issue 2

Publisher: ACM Press 

Full text available: pdf(1.10 MB) Additional Information: full citation, abstract, references, citings

In this paper we survey recent work on incremental data mining model maintenance and 
change detection under block evolution. In block evolution, a dataset is updated 
periodically through insertions and deletions of blocks of records at a time. We describe 
two techniques: (1) We describe a generic algorithm for model maintenance that takes 
any traditional incremental data mining model maintenance algorithm and transforms it 
into an algorithm that allows restrictions on a temporal su ... 

Implementing leap traversals of the itemset lattice 
Mohammad El-Hajj, Osmar R. Zaiane 

August 2005 Proceedings of the 1st international workshop on open source data 
mining: frequent pattern mining implementations OSDM '05 

Publisher: ACM Press 

Full text available: pdf(427.15 KB) Additional Information: full citation, abstract, references

The Leap-Traversal approach consists of traversing the item-set lattice by deciding on 
carefully selected nodes and avoiding systematic enumeration of candidates. We propose 
two ways to implement this approach. The first one uses a simple header-less frequent 
pattern tree and the second one partitions the transaction space using COFI-trees. In this 
paper we discuss how to avoid nodes in the lattice that would not participate in the 
answer set and hence drastically reduce the number of candidates ... 

20 Beyond intratransaction association analysis: mining multidimensional
intertransaction association rules 
Hongjun Lu, Ling Feng, Jiawei Han 

October 2000 ACM Transactions on Information Systems (TOIS), volume 18 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.31 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we extend the scope of mining association rules from traditional
single-dimensional intratransaction associations, to multidimensional intertransaction
associations. Intratransaction associations are the associations among items with the 
same transaction, where the notion of the transaction could be the items bought by the 
same customer, the events happened on the same day, and so on. However, an 
intertransaction association ... 

Keywords: association rules, data mining, intra/intertransaction, multidimensional 
context 
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Full text available: pdf(363.32 KB) Additional Information: full citation, references, index terms
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shopping assistants using individual consumer models
Chad Cumby, Andrew Fano, Rayid Ghani, Marko Krema

January 2005 Proceedings of the 10th international conference on Intelligent user 
interfaces IUI '05

Publisher: ACM Press 

Full text available: pdf(103.31 KB) Additional Information: full citation, abstract, references, index terms

This paper describes an Intelligent Shopping Assistant designed for a shopping cart 
mounted tablet PC that enables individual interactions with customers. We use machine
learning algorithms to predict a shopping list for the customer's current trip and present 
this list on the device. As they navigate through the store, personalized promotions are 
presented using consumer models derived from loyalty card data for each individual. In
order for shopping assistant devices to be effective, we believ ... 

Keywords: classification, machine learning, retail applications 
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June 2004 Proceedings of the 9th ACM SIGMOD workshop on Research issues in data 
mining and knowledge discovery DMKD '04 
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Full text available: pdf(267.02 KB) Additional Information: full citation, abstract, references



The COFI approach for mining frequent itemsets, introduced recently, is an efficient 
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algorithm that was demonstrated to outperform state-of-the-art algorithms on synthetic 
data. For instance, COFI is not only one order of magnitude faster and requires 
significantly less memory than the popular FP-Growth, it is also very effective with 
extremely large datasets, better than any reported algorithm. However, COFI has a 
significant drawback when mining dense transactional databases which is the case ... 

24 Research track: CLOSET+: searching for the best strategies for mining frequent
closed itemsets 

Jianyong Wang, Jiawei Han, Jian Pei 

August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '03 

Publisher: ACM Press 

Full text available: pdf(492.93 KB) Additional Information: full citation, abstract, references, citings, index terms

Mining frequent closed itemsets provides complete and non-redundant results for frequent 
pattern analysis. Extensive studies have proposed various strategies for efficient frequent 
closed itemset mining, such as depth-first search vs. breadth-first search, vertical formats
vs. horizontal formats, tree-structure vs. other data structures, top-down vs. bottom-up 
traversal, pseudo projection vs. physical projection of conditional database, etc. It is the 
right time to ask "what are the pros and c ... 

Keywords: association rules, frequent closed itemsets, mining methods and algorithms 



25 Sequence Mining: Sliding-window filtering: an efficient algorithm for incremental 
mining 

Chang-Hung Lee, Cheng-Ru Lin, Ming-Syan Chen 

October 2001 Proceedings of the tenth international conference on Information and 
knowledge management CIKM '01 

Publisher: ACM Press 

Full text available: pdf(1.59 MB) Additional Information: full citation, abstract, references, citings, index terms
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We explore in this paper an effective sliding-window filtering (abbreviatedly as SWF) 
algorithm for incremental mining of association rules. In essence, by partitioning a
transaction database into several partitions, algorithm SWF employs a filtering threshold
in each partition to deal with the candidate itemset generation. Under SWF, the
cumulative information of mining previous partitions is selectively carried over toward the
generation of candidate itemsets for the subsequent partitions. Alg ... 

Keywords: association rules, data mining, incremental mining, time-variant database 
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Publisher: ACM Press 

Full text available: pdf(790.74 KB) Additional Information: full citation, abstract, references, citings, index terms
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This paper introduces FAST, a novel two-phase sampling-based algorithm for discovering 
association rules in large databases. In Phase I a large initial sample of transactions is 
collected and used to quickly and accurately estimate the support of each individual item 
in the database. In Phase II these estimated supports are used to either trim "outlier" 
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transactions or select "representative" transactions from the initial sample, thereby
forming a small final sample that more accurately reflects ... 
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implications

Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, volume 27 issue 2
Publisher: ACM Press 
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Full text available: pdf(2.03 MB)
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Data mining on large data warehouses is becoming increasingly important. In support of
this trend, we consider a spectrum of architectural alternatives for coupling mining with 
database systems. These alternatives include: loose-coupling through a SQL cursor 
interface; encapsulation of a mining algorithm in a stored procedure; caching the data to 
a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; 
and SQL implementations for processing in the DBMS. We ... 

28 Striping in disk array RM2 enabling the tolerance of double disk failures
Chan-Ik Park, Tae-Young Choe 

November 1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
(CDROM) Supercomputing '96 

Publisher: IEEE Computer Society 

Full text available: pdf(188.13 KB) Additional Information: full citation, abstract, references, index terms

There is a growing demand in high reliability beyond what current RAID can provide and
there are various levels of user demand for data reliability. An efficient data placement
scheme called RM2 has been proposed in \cite{Park95}, which makes a disk array system
tolerable against double disk failures. In this paper, we consider how to choose an optimal 
striping unit for RM2 particularly when no workload information is available except 
read/write ratio. A disk array simulator for RM2 has bee ... 

Keywords: data placement, disk array, performance, reliability, striping 
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Publisher: IEEE Computer Society 

Full text available: pdf(137.42 KB) Additional Information: full citation, abstract, references, citings, index terms

Data mining is an emerging research area, whose goal is to extract significant patterns or 
interesting rules from large databases. High-level inference from large volumes of routine 
business data can provide valuable information to businesses, such as customer buying 
patterns, shelving criterion in supermarkets and stock trends. Many algorithms have been 
proposed for data mining of association rules. However, research so far has mainly
focused on sequential algorithms. In this paper we pres ... 

Keywords: Data Mining, Association Rules, Load Balancing, Hash Tree Balancing, 
Hashing, Shared-Memory Multi-processor 
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August 2001 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 10 Issue 1 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(167.25 KB) Additional Information: full citation, abstract, citings, index terms

Online personalization is of great interest to e-companies. Virtually all personalization 
technologies are based on the idea of storing as much historical customer session data as
possible, and then querying the data store as customers navigate through a web site. The 
holy grail of online personalization is an environment where fine-grained, detailed 
historical session data can be queried based on current online navigation patterns for use 
in formulating real-time responses. Unfortunately, as mo ... 

Keywords: Behavior-based personalization, Dynamic lookahead profile, Profile caching,
Scalable online personalization, Web site and interaction model
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conference on Management of data SIGMOD '96, volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(1.14 MB) Additional Information: full citation, abstract, references, citings, index terms
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We discuss data mining based on association rules for two numeric attributes and one 
Boolean attribute. For example, in a database of bank customers, "Age" and "Balance" are 
two numeric attributes, and "CardLoan" is a Boolean attribute. Taking the pair (Age, 
Balance) as a point in two-dimensional space, we consider an association rule of the form 
((Age, Balance) ∈ P) ⇒ (CardLoan = yes), which implies that bank customers
whose ages and balances fall in ... 
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June 2004 Proceedings of the 9th ACM SIGMOD workshop on Research issues in data 
mining and knowledge discovery DMKD '04

Publisher: ACM Press 

Full text available: pdf(148.81 KB) Additional Information: full citation, abstract, references

Data mining has been widely recognized as a powerful tool to explore added value from 
large-scale databases. One data mining technique, generalized association rule mining
with taxonomy, has the potential to discover more useful knowledge than ordinary flat
association rule mining by taking application specific information into account. We 
propose pattern growth mining paradigm based FP-tax algorithm, which employs a tree 
structure to compress the database. Two methods to traverse the tree ... 

Keywords: data mining, generalized association rule 



35 Mining optimized association rules for numeric attributes 
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama
June 1996 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on
Principles of database systems PODS '96 
Publisher: ACM Press 

Full text available: pdf(873.60 KB) Additional Information: full citation, references, citings, index terms



36 Book reviews 

September 2001 intelligence, Volume 12 issue 3
Publisher: ACM Press
Full text available: pdf(85.17 KB), html(36.51 KB) Additional Information: full citation, references, index terms



37 Clustering transactions using large items 
Ke Wang, Chu Xu, Bing Liu

November 1999 Proceedings of the eighth international conference on Information 
and knowledge management CIKM '99 

Publisher: ACM Press 
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In traditional data clustering, similarity of a cluster of objects is measured by pairwise 
similarity of objects in that cluster. We argue that such measures are not appropriate for 
transactions that are sets of items. We propose the notion of large items, i.e., items
contained in some minimum fraction of transactions in a cluster, to measure the similarity 
of a cluster of transactions. The intuition of our clustering criterion is that there should be 
many large items withi ... 
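
The notion of large items described in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated definition (items contained in some minimum fraction of the transactions in a cluster), not the authors' clustering algorithm; the function name `large_items` and the parameter `min_fraction` are hypothetical.

```python
from collections import Counter


def large_items(cluster, min_fraction):
    """Return the items found in at least `min_fraction` of the transactions.

    `cluster` is a list of transactions, each an iterable of items.
    `min_fraction` is a hypothetical name for the minimum-fraction threshold.
    """
    if not cluster:
        return set()
    # Count each item at most once per transaction.
    counts = Counter(item for transaction in cluster for item in set(transaction))
    threshold = min_fraction * len(cluster)
    return {item for item, count in counts.items() if count >= threshold}


cluster = [{"milk", "bread"}, {"milk", "eggs"}, {"milk", "bread", "jam"}]
print(sorted(large_items(cluster, 0.6)))  # items in at least 60% of transactions
```

With a threshold of 0.6, an item must occur in at least two of the three transactions here, so `milk` and `bread` qualify while `eggs` and `jam` do not.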

Data streams I: Clustering binary data streams with K-means 
Carlos Ordonez

June 2003 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data 
mining and knowledge discovery DMKD '03 

Publisher: ACM Press 

Full text available: pdf(149.75 KB) Additional Information: full citation, abstract, references, citings

Clustering data streams is an interesting Data Mining problem. This article presents three 
variants of the K-means algorithm to cluster binary data streams. The variants include 
On-line K-means, Scalable K-means, and Incremental K-means, a proposed variant 
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introduced that finds higher quality solutions in less time. Higher quality of solutions are 
obtained with a mean-based initialization and incremental learning. The speedup is 
achieved through a simplified set of sufficient statistics and oper ... 
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